Abstract

Small subgraph counts can be used as summary statistics for large
random graphs. We use the Stein-Chen method to derive Poisson
approximations for the distribution of the number of subgraphs in the
stochastic block model which are isomorphic to some fixed graph. We
also obtain Poisson approximations for subgraph counts in a graphon-type
generalisation of the model in which the edge probabilities are
(possibly dependent) random variables supported on a subset of [0,1].

Our results apply when the fixed graph is a member of the class of 
strictly balanced graphs. 
1 Introduction

Small subgraph counts can be used as summary statistics for large random
graphs; indeed in some graph models they appear as sufficient statistics,
see na. Moreover, many networks are conjectured to have over- or
under-represented motifs (small subgraphs), see for example [19]. Statistics
based on small subgraph counts can be used to compare networks, as in [31IM].
To determine which small subgraphs are unusual, assessing the distribution of
such motifs is key. While [22] gives the mean and variance for some common
random graph models, [22] does not derive a distributional approximation.

In this paper, we address the issue of such a distributional approximation
for a large class of models which include stochastic block models and
a graphon model but also models with random edge probabilities, provided
that the edge probabilities display some local dependence, which will be made
clearer in Section 3.

The stochastic block model (SBM) was introduced originally for directed
graphs by [13] and generalised to other graphs by [20]; it is also called the
Erdős-Rényi Mixture Model in HDl, and in theoretical computer science it is
called the Planted Partition Model [9]. It has a wide range of applications, see
for example [D El HOI min], and ini for a recent survey. The model is
defined as follows. Consider an undirected random graph on $n$ vertices, with
no self-loops or multiple edges, in which the vertices are spread among $Q$
hidden classes with respective proportion vector $f = (f_1, \ldots, f_Q)$. The
class label of a vertex is drawn from a multinomial distribution
$\mathcal{M}(1, f)$, and class assignments are independent of each other. Edges
$Y_{ij}$ are independent conditionally on the classes of the vertices, and the
edge probability depends only on the classes of the vertices:

$$\mathbb{P}(Y_{ij} = 1 \mid i \in a, j \in b) = \pi_{a,b}.$$

We shall denote this model by $SBM(n, \pi, f)$. If $\pi_{a,b} = p$ for all $a$
and $b$, the SBM reduces to the classical Erdős-Rényi random graph model, which
we denote by $\mathcal{G}(n,p)$. In this paper, it is assumed that $\pi$ and $f$
are known, and that $f_1, \ldots, f_Q > 0$; for estimating these quantities see,
for example, [imsiia].
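As an illustration of the model definition, the following minimal sketch draws one realisation of $SBM(n, \pi, f)$ using only the Python standard library. The function name `sample_sbm` and the representation of $\pi$ as a nested list are assumptions made for this example, not part of the paper.

```python
import random

def sample_sbm(n, f, pi, seed=None):
    """Draw class labels and an edge set from SBM(n, pi, f).

    f  : list of Q class proportions (hypothetical input format)
    pi : Q x Q symmetric nested list of edge probabilities
    """
    rng = random.Random(seed)
    # Independent class labels, each drawn from the multinomial M(1, f).
    labels = rng.choices(range(len(f)), weights=f, k=n)
    edges = set()
    for i in range(n):
        for j in range(i + 1, n):
            # Given the labels, edges are independent Bernoulli variables.
            if rng.random() < pi[labels[i]][labels[j]]:
                edges.add((i, j))
    return labels, edges

labels, edges = sample_sbm(50, [0.5, 0.5], [[0.3, 0.05], [0.05, 0.3]], seed=1)
```

Taking a single class with `pi = [[p]]` gives a draw from the Erdős-Rényi model $\mathcal{G}(n,p)$.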

For a fixed graph $G$, it is known (see, for example, [5], Theorem 5.B)
that the distribution of the number of copies of $G$ in the $\mathcal{G}(n,p)$
model is well approximated by an appropriate Poisson distribution if $G$ is a
member of the class of strictly balanced graphs (defined below) as long as $p$
is not too large. In fact, [5] give explicit bounds on the rate of convergence
in the Poisson approximation.

In this paper, we consider a generalisation of the Poisson approximation
to the stochastic block model. We obtain explicit bounds for the rate of
convergence, and consider both the cases that the edge probabilities
$\pi_{a,b}$ are constant and that they are themselves random variables supported
on a subset of [0,1]. This paper therefore contains the following two features
that have not appeared together before: the random graphs may have local
dependence or inhomogeneity, or both, and the approximation is quantitative
(as opposed to only asymptotic). The second feature has appeared without
the first (for example in [5]) and the first feature has appeared without the
second (for example in [7]), at least for the case of cycles in the case of
constant average degree stochastic block models. It is the combination of
both features that is novel.

When the edge probabilities are themselves random variables, we assume
that they are only locally dependent, in the sense that for each edge there is
only a relatively small number of other edges whose random edge probabilities
are not independent of its own. As an example, the vertices may have some
exogenous characteristics such as geographical location which influence the
probability of an edge to exist, but only locally.

The latter case is related to a graphon model, where edge probabilities
only depend on those edge probabilities where the edges share a vertex. A
graphon is represented by a measurable function $h : [0,1]^2 \to [0,1]$. A
graphon model constructs a random graph on $n$ vertices by assigning
independent $U[0,1]$ variables to each vertex. Conditional on these uniform
random variables, all edges are independent, and the probability of an edge
between vertices $u$ and $v$ is given by $h(U_u, U_v)$. These graphs appear as
limits of exchangeable graphs; see, for example, [21 [mini. They are a special
case of inhomogeneous random graph models as considered in [7].
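The graphon construction just described can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `sample_graphon_graph` is hypothetical, and $h$ is passed as an ordinary Python callable.

```python
import random

def sample_graphon_graph(n, h, seed=None):
    """Draw a random graph on n vertices from the graphon h."""
    rng = random.Random(seed)
    # One independent U[0,1] latent variable per vertex.
    u = [rng.random() for _ in range(n)]
    # Conditional on u, each edge {i, j} is present independently
    # with probability h(u[i], u[j]).
    edges = {(i, j)
             for i in range(n) for j in range(i + 1, n)
             if rng.random() < h(u[i], u[j])}
    return u, edges

u, edges = sample_graphon_graph(30, lambda x, y: x * y, seed=7)
```

A piecewise constant `h` turns this sampler into a stochastic block model sampler, with the class of a vertex determined by the interval containing its latent variable.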

Setting the scene for counting copies of graphs $G$, let $K_n$ be the complete
graph with $n$ vertices and $\binom{n}{2}$ edges. Let $G \subset K_n$ be a fixed
graph with $v(G)$ vertices and $e(G)$ edges; let $V(G)$ denote the vertex set
and $E(G)$ its edge set. To avoid trivialities, we assume that $e(G) \ge 1$ and
that $G$ has no isolated vertices. We shall be particularly interested in the
case that $G$ is a member of the class of strictly balanced graphs, which we now
define according to [5]. Let

$$d(G) = \frac{e(G)}{v(G)}.$$

Then the graph $G$ is said to be strictly balanced if $d(H) < d(G)$ for all
subgraphs $H \subsetneq G$.
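For small graphs the definition can be checked by brute force. The sketch below (the function names are hypothetical) represents a graph by its edge list and only inspects subgraphs given by non-empty proper edge subsets together with their endpoints; this suffices because adding isolated vertices to a subgraph can only lower its density.

```python
from fractions import Fraction
from itertools import combinations

def density(edges):
    """d(H) = e(H) / v(H) for the graph spanned by the given edges."""
    vertices = {x for e in edges for x in e}
    return Fraction(len(edges), len(vertices))

def is_strictly_balanced(edges):
    """Check d(H) < d(G) for every non-empty proper edge subset H of G."""
    edges = list(edges)
    d_G = density(edges)
    for k in range(1, len(edges)):
        for sub in combinations(edges, k):
            if density(sub) >= d_G:
                return False
    return True

triangle = [(0, 1), (1, 2), (0, 2)]
triangle_with_tail = triangle + [(2, 3)]
print(is_strictly_balanced(triangle), is_strictly_balanced(triangle_with_tail))
```

The triangle with a pendant edge fails the test because its triangle subgraph already has density 1, the density of the whole graph. The enumeration is exponential in $e(G)$, so it is only meant for small fixed graphs.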

Let $\Gamma$ denote the set of $v(G)$-tuples of elements from $\{1, \ldots, n\}$.
Then $\alpha \in \Gamma$ is a possible position for the subgraph $G$, and there
are $\binom{n}{v(G)}$ such positions. To account for re-labelling of vertices,
let $R_\alpha(G)$ denote the set of all subgraphs of the complete graph on the
$v(G)$-tuple $\alpha$ which are isomorphic to $G$ (a similar notation was
introduced in Picard et al. [22]). For any $\alpha \in \Gamma$, the number of
elements in the set $R_\alpha(G)$ is given by

$$\rho(G) = \frac{v(G)!}{a(G)}, \tag{1.1}$$

where $a(G)$ is the number of elements in the automorphism group of $G$.

Now, let $\mathcal{G} = (V, \mathcal{E})$ be a random graph on $n$ vertices. For
$\alpha \in \Gamma$ and $G' \in R_\alpha(G)$, let $X_\alpha(G')$ be the indicator
random variable for the occurrence, at the $v(G)$-tuple $\alpha$, of a subgraph
$G'$ which is isomorphic to $G$. We shall let $W$ denote the total number of
copies of $G$ in the random graph $\mathcal{G}$:

$$W = \sum_{\alpha \in \Gamma} \sum_{G' \in R_\alpha(G)} X_\alpha(G'). \tag{1.2}$$

Here, copies are counted as opposed to induced copies, where not only
all edges of the graph have to appear, but also no edge which is not in the
graph is allowed to appear in the copy. For example, the complete graph $K_n$,
$n > 3$, contains $(n-1)!/2$ copies, but no induced copy, of an $n$-cycle.

To illustrate our notation, consider counting the number of isomorphic
copies of the path on three vertices, denoted by $G$, in a graph $\mathcal{G}$
with vertex set $\{1,2,3,4\}$. We first construct the set $\Gamma$ of all
3-tuples from the set $\{1,2,3,4\}$. For the set $\{1,2,3\} \in \Gamma$, we
consider the indicators $X_{\{1,2,3\}}(G')$ where $G' \in R_{\{1,2,3\}}(G)$. The
set $R_{\{1,2,3\}}(G)$ contains three non-redundant ways $G'_1, G'_2, G'_3$ that
a copy of $G$ can occur on $\{1,2,3\}$, these being if edges $\{1,2\}$ and
$\{1,3\}$ are present; edges $\{2,1\}$ and $\{2,3\}$ are present; or edges
$\{3,1\}$ and $\{3,2\}$ are present. We count the number of occurrences of $G$
in $\{1,2,3\}$ using the indicators $X_{\{1,2,3\}}(G'_i)$, $i = 1,2,3$, and we
then repeat this procedure for all $\alpha \in \Gamma$. Since
$|R_{\{1,2,3\}}(G)| = 3$ and $|\Gamma| = \binom{4}{3} = 4$, there can be at most
12 copies of $G$ in the graph $\mathcal{G}$. For example, if $\mathcal{G}$ is the
cycle graph with edge set $\{\{1,2\},\{2,3\},\{3,4\},\{4,1\}\}$, then
$X_{\{1,2,3\}}(\{1,2\},\{1,3\}) = 0$, $X_{\{1,2,3\}}(\{2,1\},\{2,3\}) = 1$ and
$X_{\{1,2,3\}}(\{3,1\},\{3,2\}) = 0$.
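The counting scheme of this illustration can be reproduced directly. In the sketch below (function names are hypothetical), `copies_on` enumerates $R_\alpha(G)$ by relabelling $G$ onto $\alpha$ in every possible way and discarding duplicates, and `count_copies` sums the indicators $X_\alpha(G')$ over all positions.

```python
from itertools import combinations, permutations

def copies_on(alpha, g_edges, g_vertices):
    """R_alpha(G): distinct edge sets on alpha that are isomorphic to G."""
    result = set()
    for perm in permutations(alpha):
        relabel = dict(zip(g_vertices, perm))
        result.add(frozenset(frozenset((relabel[u], relabel[v]))
                             for u, v in g_edges))
    return result

def count_copies(host_vertices, host_edges, g_edges):
    """W: number of (not necessarily induced) copies of G in the host graph."""
    g_vertices = sorted({x for e in g_edges for x in e})
    host = {frozenset(e) for e in host_edges}
    total = 0
    for alpha in combinations(host_vertices, len(g_vertices)):
        for copy in copies_on(alpha, g_edges, g_vertices):
            total += copy <= host   # X_alpha(G') = 1 iff every edge of G' is present
    return total

path = [(0, 1), (1, 2)]
four_cycle = [(1, 2), (2, 3), (3, 4), (4, 1)]
print(count_copies([1, 2, 3, 4], four_cycle, path))
```

Each set $\alpha$ of three vertices carries three candidate paths, in agreement with $\rho(G) = 3!/2 = 3$, and the 4-cycle realises exactly one of them per $\alpha$.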

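In the stochastic block model $SBM(n, \pi, f)$, the conditional occurrence
probability of an isomorphic copy $G'$ of the subgraph $G$ on
$\alpha = (i_1, \ldots, i_{v(G)})$ given the class of each vertex is

$$\mathbb{P}(X_\alpha(G') = 1 \mid i_1 \in c_1, \ldots, i_{v(G)} \in c_{v(G)}) = \prod_{1 \le u < v \le v(G) :\, (u,v) \in E(G)} \pi_{c_u, c_v}.$$

The occurrence probability of an isomorphic copy $G'$ of $G$ is then

$$\mu(G) = \mathbb{E}X_\alpha(G') = \sum_{c_1, \ldots, c_{v(G)} = 1}^{Q} f_{c_1} f_{c_2} \cdots f_{c_{v(G)}} \prod_{1 \le u < v \le v(G) :\, (u,v) \in E(G)} \pi_{c_u, c_v}. \tag{1.3}$$

Note that for any $\alpha, \beta \in \Gamma$ and $G' \in R_\alpha(G)$,
$G'' \in R_\beta(G)$ we do indeed have
$\mathbb{E}X_\alpha(G') = \mathbb{E}X_\beta(G'')$. We therefore have that

$$\lambda = \mathbb{E}W = \binom{n}{v(G)} \rho(G) \mu(G). \tag{1.4}$$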
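The quantities $\mu(G)$, $\rho(G)$ and $\lambda$ can be evaluated by direct enumeration for small graphs. The following sketch is one possible way to do so; the function names and the nested-list encoding of $\pi$ are assumptions of this example, and the cost grows like $Q^{v(G)}$ and $v(G)!$.

```python
from itertools import permutations, product
from math import comb, factorial, prod

def vertices_of(g_edges):
    return sorted({x for e in g_edges for x in e})

def mu(g_edges, f, pi):
    """Occurrence probability of one copy of G, summing over class labels."""
    verts = vertices_of(g_edges)
    pos = {v: k for k, v in enumerate(verts)}
    total = 0.0
    for c in product(range(len(f)), repeat=len(verts)):
        class_weight = prod(f[a] for a in c)
        edge_prob = prod(pi[c[pos[u]]][c[pos[v]]] for u, v in g_edges)
        total += class_weight * edge_prob
    return total

def rho(g_edges):
    """v(G)! / a(G), with a(G) found by testing every vertex permutation."""
    verts = vertices_of(g_edges)
    edge_set = {frozenset(e) for e in g_edges}
    automorphisms = 0
    for perm in permutations(verts):
        m = dict(zip(verts, perm))
        if {frozenset((m[u], m[v])) for u, v in g_edges} == edge_set:
            automorphisms += 1
    return factorial(len(verts)) // automorphisms

def expected_count(n, g_edges, f, pi):
    """lambda = C(n, v(G)) * rho(G) * mu(G)."""
    return comb(n, len(vertices_of(g_edges))) * rho(g_edges) * mu(g_edges, f, pi)

triangle = [(0, 1), (1, 2), (0, 2)]
print(expected_count(100, triangle, [0.5, 0.5], [[0.02, 0.01], [0.01, 0.02]]))
```

With a single class and `pi = [[p]]`, `mu` reduces to $p^{e(G)}$, the Erdős-Rényi value.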


In this paper, we use the Stein-Chen method for Poisson approximation,
introduced by [8], to assess the distributional distance between
$\mathcal{L}(W)$ and the $Po(\lambda)$ distribution when the fixed graph $G$ is
a member of the class of strictly balanced graphs. This discrepancy is measured
using the total variation distance, which for non-negative, integer-valued
random variables $U$ and $V$ is given by

$$d_{TV}(\mathcal{L}(U), \mathcal{L}(V)) = \sup_{A \subseteq \mathbb{Z}_+} |\mathbb{P}(U \in A) - \mathbb{P}(V \in A)|.$$
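For distributions on the non-negative integers, the supremum in the total variation distance equals half the $\ell_1$ distance between the probability mass functions. The sketch below uses this identity to compare a binomial distribution with a Poisson distribution of the same mean; it is an illustration for independent indicators only (not the subgraph count $W$), the function names are hypothetical, and the Poisson mass function is truncated at `kmax`.

```python
from math import comb, exp

def tv_distance(p, q):
    """Total variation distance between two pmfs given as dicts k -> mass."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in support)

def binomial_pmf(n, p):
    return {k: comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)}

def poisson_pmf(lam, kmax):
    """Poisson(lam) masses for k = 0..kmax, built recursively to avoid overflow."""
    pmf, term = {}, exp(-lam)
    for k in range(kmax + 1):
        pmf[k] = term
        term *= lam / (k + 1)
    return pmf

n, p = 100, 0.02
distance = tv_distance(binomial_pmf(n, p), poisson_pmf(n * p, n))
stein_chen_bound = (1 - exp(-n * p)) * p   # classical bound for i.i.d. indicators
print(distance, stein_chen_bound)
```

The truncation discards only the Poisson mass above `kmax`, which is negligible here.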


In deriving bounds on the total variation distance, we exploit the local
dependence structure of the indicators $X_\alpha(G')$. To this end, for each
$\alpha \in \Gamma$, we introduce a set $A_\alpha$ which can be viewed as a
dependency neighbourhood of $\alpha$. In the SBM, as class assignments are
independent and the edge probabilities are given, we can take

$$A_\alpha = \{\beta \in \Gamma : |\alpha \cap \beta| \ge 1\}.$$

Here, $A_\alpha$ is a dependency neighbourhood of $\alpha$ in the sense that if
$|\alpha \cap \beta| = 0$, then $X_\alpha(G')$ and $X_\beta(G'')$ are independent
for any $G' \in R_\alpha(G)$, $G'' \in R_\beta(G)$. In Section 3, the edge
probabilities are random variables, in which case finding a suitable dependency
neighbourhood $A_\alpha$ is more involved. With

$$\eta_\alpha(G') = \sum_{\beta \in A_\alpha} \sum_{G'' \in R_\beta(G)} X_\beta(G'')$$

and

$$\theta_\alpha(G') = X_\alpha(G')(\eta_\alpha(G') - X_\alpha(G')), \tag{1.5}$$
a simple corollary of Theorem 1 in [1], or of Theorem 1.A in [5], is that

$$d_{TV}(\mathcal{L}(W), Po(\lambda)) \le \frac{1 - e^{-\lambda}}{\lambda} \sum_{\alpha \in \Gamma} \sum_{G' \in R_\alpha(G)} \{\mathbb{E}X_\alpha(G')\,\mathbb{E}\eta_\alpha(G') + \mathbb{E}\theta_\alpha(G')\}. \tag{1.6}$$

Thus bounding the total variation distance between the distribution of the
subgraph counts in the SBM and the $Po(\lambda)$ distribution reduces to
bounding the expectations on the right-hand side of (1.6). We shall prove our
Poisson approximations for subgraph counts (Theorems 2.1 and 3.1 and
Corollary 4.1) using this approach. The Poisson approximation results of these
theorems are valid when the fixed graph $G$ is strictly balanced and the edge
probabilities $\pi_{a,b}$ are not too large. These theorems generalise
Theorem 5.B of [5], which asserts that a Poisson approximation is valid in the
$\mathcal{G}(n,p)$ model under the same conditions.

The Poisson approximation is valid under these conditions in the SBM for
exactly the same reason as it is in the $\mathcal{G}(n,p)$ model; if $G$ is
strictly balanced and the $\pi_{a,b}$ are not too large, with high probability
the copies of $G$ are vertex disjoint and the $X_\alpha(G')$ are close to being
independent. Thus, $W$ is the sum of a large number of almost independent
indicators with small means, and a Poisson approximation is valid. In the
$\mathcal{G}(n,p)$ model, the Poisson approximation breaks down if $G$ is not
strictly balanced [23], although Compound Poisson approximations may still be
valid for certain classes of subgraphs; see [25]. For this reason, we restrict
our attention to strictly balanced graphs.

The rest of the paper is organised as follows. In Section 2, we use the
Stein-Chen method to derive a Poisson approximation for the number of
subgraphs in the SBM which are isomorphic to some fixed graph from the
class of strictly balanced graphs. In Section 3, we consider a generalisation
of this problem in which the edge probabilities are now (possibly locally
dependent) random variables supported on a subset of [0,1]. Again, we
derive a Poisson approximation for the number of copies of a fixed subgraph
in this model. Section 4 gives a Poisson approximation of small graph counts
in the graphon model.
2 Poisson approximation of subgraph counts in the stochastic block model

In this section, we obtain a Poisson approximation for the number of
subgraphs in the SBM which are isomorphic to a fixed graph from the class
of strictly balanced graphs. Before stating this result, we introduce some
notation. Let

$$\alpha(G) = \min_H \frac{e(G) - e(H)}{v(G) - v(H)} \tag{2.1}$$

and

$$\gamma(G) = \min_H \big(d(G)v(H) - e(H)\big) = \min_H v(H) \cdot (d(G) - d(H)), \tag{2.2}$$

where the minima are taken over all non-empty subgraphs $H \subset G$ without
isolated vertices. It is worth noting that the graph $G$ is strictly balanced if
$\gamma(G) > 0$ or $\alpha(G) > d(G)$; see [5]. Also, let

$$\pi^* = \max_{1 \le a \le b \le Q} \pi_{a,b} \tag{2.3}$$

denote the maximum edge probability.


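Theorem 2.1. Suppose that $G$ is a strictly balanced graph. Then, with the
notation (1.1), (1.3), (2.1), (2.2) and (2.3),

$$d_{TV}(\mathcal{L}(W), Po(\lambda)) \le (1 - e^{-\lambda})\rho(G) \left\{ \frac{v(G)^2 n^{v(G)-1}}{v(G)!}\big(\mu(G) + (\pi^*)^{e(G)}\big) + \pi^* + \sum_{s=2}^{v(G)-1} \binom{v(G)}{s} \frac{n^{v(G)-s} (\pi^*)^{\kappa(G,s)}}{(v(G)-s)!} \right\}, \tag{2.4}$$

where

$$\kappa(G,s) = \max\big(e(G) - s\,d(G) + \gamma(G),\ (v(G) - s)\alpha(G)\big). \tag{2.5}$$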
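The graph constants entering the theorem are easy to obtain by brute force for a small fixed graph. The sketch below computes $d(G)$, $\alpha(G)$, $\gamma(G)$ and $\kappa(G,s)$ exactly with rational arithmetic. It rests on two assumptions that are not spelled out in the text: the minima run over proper non-empty edge subsets (so $G$ needs at least two edges), and subgraphs with $v(H) = v(G)$ are skipped in $\alpha(G)$ because the ratio is undefined there. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

def proper_subgraphs(edges):
    """Yield (v(H), e(H)) for every non-empty proper edge subset H of G."""
    edges = list(edges)
    for k in range(1, len(edges)):
        for sub in combinations(edges, k):
            yield len({x for e in sub for x in e}), k

def graph_constants(edges):
    """Return (d(G), alpha(G), gamma(G)) as exact fractions."""
    edges = list(edges)
    v_G = len({x for e in edges for x in e})
    e_G = len(edges)
    d_G = Fraction(e_G, v_G)
    subs = list(proper_subgraphs(edges))
    # alpha(G): assumption, only subgraphs on fewer vertices than G are used
    alpha = min(Fraction(e_G - e_H, v_G - v_H) for v_H, e_H in subs if v_H < v_G)
    gamma = min(d_G * v_H - e_H for v_H, e_H in subs)
    return d_G, alpha, gamma

def kappa(edges, s):
    """kappa(G, s) = max(e(G) - s d(G) + gamma(G), (v(G) - s) alpha(G))."""
    edges = list(edges)
    v_G = len({x for e in edges for x in e})
    d_G, alpha, gamma = graph_constants(edges)
    return max(len(edges) - s * d_G + gamma, (v_G - s) * alpha)

five_cycle = [(i, (i + 1) % 5) for i in range(5)]
print(graph_constants(five_cycle), kappa(five_cycle, 2))
```

For the 5-cycle this gives $d(G) = 1$, $\alpha(G) = 4/3$ and $\gamma(G) = 1$, in line with the closed forms $1$, $(v-1)/(v-2)$ and $1$ for cycle graphs.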


Proof. We establish our bound by bounding the right-hand side of inequality
(1.6), starting with $\mathbb{E}\eta_\alpha(G')$. For the dependence set
$A_\alpha = \{\beta \in \Gamma : |\alpha \cap \beta| \ge 1\}$,

$$|A_\alpha| \le v(G)\binom{n}{v(G)-1} \le \frac{v(G)^2 n^{v(G)-1}}{v(G)!}. \tag{2.6}$$

It is now clear from (1.3) and (2.6) that

$$\mathbb{E}\eta_\alpha(G') = \sum_{\beta \in A_\alpha} \sum_{G'' \in R_\beta(G)} \mathbb{E}X_\beta(G'') = |A_\alpha|\rho(G)\mu(G) \le \frac{\rho(G) v(G)^2 n^{v(G)-1}}{v(G)!}\,\mu(G). \tag{2.7}$$


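The more involved part of the proof, where the assumption of strict
balancedness comes into play, is to bound the expectation
$\mathbb{E}\theta_\alpha(G')$ from (1.5). When $\alpha$ and $\beta$ have
considerable overlap, then $\mathbb{E}X_\alpha(G')X_\beta(G'')$ may be large
compared to $\mathbb{E}X_\alpha(G')$, but there are not many $\beta$'s which
have considerable overlap with $\alpha$. To take account of the overlap, we
partition $A_\alpha$ into sets $\{\Gamma_\alpha^s\}_{1 \le s \le v(G)}$, where
$\Gamma_\alpha^s = \{\beta \in \Gamma : |\alpha \cap \beta| = s\}$. These sets
can be bounded above by

$$|\Gamma_\alpha^s| \le \binom{v(G)}{s}\binom{n}{v(G)-s} \le \binom{v(G)}{s}\frac{n^{v(G)-s}}{(v(G)-s)!}.$$

Now, recalling (1.5),

$$\mathbb{E}\theta_\alpha(G') = \sum_{s=1}^{v(G)-1} \sum_{\beta \in \Gamma_\alpha^s} \sum_{G'' \in R_\beta(G)} \mathbb{E}X_\alpha(G')X_\beta(G'') + \sum_{G'' \in R_\alpha(G),\, G'' \ne G'} \mathbb{E}X_\alpha(G')X_\alpha(G'').$$

To bound the expectations in the above expression, we consider the cases of
different overlap $s$ separately.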

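Firstly, for $G' \ne G''$, and for $s = v(G)$, so that $\alpha = \beta$, there
must be at least one edge present in $G''$ which is not in $G'$. Due to the
conditional independence of the edges, for any edge indicator $Y_{i,j}$ which is
not included in $X_\alpha(G')$,

$$\mathbb{P}(Y_{i,j} = 1 \mid X_\alpha(G') = 1) = \sum_{a,b=1}^{Q} \pi_{a,b}\, \mathbb{P}(i \in a, j \in b \mid X_\alpha(G') = 1) \le \pi^*. \tag{2.8}$$

Hence

$$\mathbb{E}X_\alpha(G')X_\beta(G'') \le \mu(G)\pi^* \quad \text{for } \beta \in \Gamma_\alpha^{v(G)}.$$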

Next, we consider the case $s = 1$, in which $\alpha$ and $\beta$ only intersect
at a single vertex. As a result, $G'$ and $G''$ cannot share an edge. Using the
generalisation of (2.8) that for any set of edges $A$ which does not overlap
with the edges in $X_\alpha(G')$,

$$\mathbb{P}(Y_{i,j} = 1, (i,j) \in A \mid X_\alpha(G') = 1) \le (\pi^*)^{|A|}, \tag{2.9}$$

it follows that

$$\mathbb{E}X_\alpha(G')X_\beta(G'') \le \mu(G)(\pi^*)^{e(G)} \quad \text{for } \beta \in \Gamma_\alpha^1.$$

Finally, we consider the case $2 \le s \le v(G) - 1$. We shall derive two
bounds for the expectation $\mathbb{E}X_\alpha(G')X_\beta(G'')$.

There are $e(G)$ edges from the subgraph $G'$ given on $\alpha$ and we now
consider the number of additional edges resulting from the subgraph $G''$
given on $\beta$. Here the underlying graph is the complete graph. Consider
the subgraph $H$ of the intersection graph of $G'$ and $G''$ induced on
the intersection of $\alpha$ and $\beta$, which has vertex set
$V(H) = \alpha \cap \beta$ and edge set $E(H)$, for which $e \in E(H)$ if and
only if $e \in E(G') \cap E(G'')$. Due to the fact that
$|\alpha \cap \beta| = s$, we have $v(H) = s$, and, because $G'$ is strictly
balanced, it must be the case that $d(H) < d(G)$ (we have $d(G') = d(G)$, as
$G$ and $G'$ are isomorphic), and so $e(H) < s\,d(G)$. Recalling (2.2), we have
$e(H) + \gamma(G) \le s\,d(G)$, that is $e(H) \le s\,d(G) - \gamma(G)$. Thus,
there are at least $e(G) - (s\,d(G) - \gamma(G)) = e(G) - s\,d(G) + \gamma(G)$
edges from $G''$ which are not in the subgraph $G'$, and so the union graph of
$G'$ and $G''$ on $\alpha \cup \beta$ has at least
$2e(G) - s\,d(G) + \gamma(G)$ edges.

Alternatively, with $\alpha(G)$ as in (2.1),

$$e(G) - e(H) = (v(G) - v(H))\,\frac{e(G) - e(H)}{v(G) - v(H)} \ge (v(G) - v(H))\alpha(G) = (v(G) - s)\alpha(G),$$

and therefore there are at least $e(G) + (v(G) - s)\alpha(G)$ edges in the union
graph of $G'$ and $G''$ on $\alpha \cup \beta$. This bound in connection with
(2.9) leads to the bound

$$\mathbb{E}X_\alpha(G')X_\beta(G'') \le \mu(G)(\pi^*)^{\kappa(G,s)} \quad \text{for } \beta \in \Gamma_\alpha^s,$$

where $\kappa(G,s) = \max(e(G) - s\,d(G) + \gamma(G), (v(G) - s)\alpha(G))$.
Collecting the bounds gives


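$$\begin{aligned}
\mathbb{E}\theta_\alpha(G') &\le \mu(G)\Bigg\{ \sum_{G'' \in R_\alpha(G),\, G' \ne G''} \pi^* + \sum_{\beta \in \Gamma_\alpha^1} \sum_{G'' \in R_\beta(G)} (\pi^*)^{e(G)} + \sum_{s=2}^{v(G)-1} \sum_{\beta \in \Gamma_\alpha^s} \sum_{G'' \in R_\beta(G)} (\pi^*)^{\kappa(G,s)} \Bigg\} \\
&\le \mu(G)\rho(G)\Bigg\{ \pi^* + \frac{v(G)^2 n^{v(G)-1}}{v(G)!}(\pi^*)^{e(G)} + \sum_{s=2}^{v(G)-1} \binom{v(G)}{s}\frac{n^{v(G)-s}}{(v(G)-s)!}(\pi^*)^{\kappa(G,s)} \Bigg\}.
\end{aligned} \tag{2.10}$$

Finally, substituting (2.7) and (2.10) into (1.6) and recalling (1.4) yields
(2.4). □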

Remark 2.2. 1. The stochastic block model structure enters the proof
only through the expression for $\mu(G)$ as well as the bound (2.9).

2. Theorem 2.1 generalises Theorem 5.B of [5] for the Erdős-Rényi random
graph model to the stochastic block model. When we take $\pi_{a,b} = p$
for all $a, b$ we recover the same rate of convergence as that given by
Theorem 5.B of [5]. Indeed the graph combinatorics arguments in our
proof are strongly related to those in the proof of Theorem 5.B of [5]. It
should, however, be noted that our proof uses a local coupling approach
whereas the proof in [5] uses size bias couplings.


3. To assess the behaviour of the bound it may be advantageous to use
the bound $1 - e^{-\lambda} \le \min(1, \lambda)$. Heuristically, a Poisson
approximation should hold when $\mu(G)$ is small. When $\mu(G)$ is so small
that $\lambda < 1$ then the factor $1 - e^{-\lambda}$ is beneficial.


4 . For a strictly balanced graph, 

k{G, s) > (n(G) - s)a(G') > (nPG) - s)d{G) (2.11) 

for all s = t),... ,v{G) — 1. Let Ak = k{G, s) — {v{G) — s)d{G). Then 
Ak > 0. Using fl2.9p we can bound p{G) < (vr*)®*^*^). If n{7r*Y^^'^ is 
bounded by c as n ^ 00 then A < and < 

_ Moreover the bound in Theorem \2.1\ is then of order 

_ A.K 

0(min(n“^,n as n ^ 00 , with proportion vector f and graph G 

fixed. 
5. Theorem 2.1 is not an asymptotic result but an explicit bound, which
may or may not be small.

6 . The result of Theorem ] 2. 1\ is perhaps most interesting when the limiting 

Po{X) distribution is non-degenerate in the limit n oo. Suppose that 
there exist universal constants c and C such that < Ha^b ^ 

for all a,b. Then using the ineguality ^ 

1 < k < m and (El we obtain 


P{G) 

v{G)<G) 


< A < 


P{G) ^e{G) 
v{G)\ 


Moreover, 

dTv{C{W),PoW) < min (l, 

+min(Al, 5 )|, (2.12) 


where 


A = (^l ^ C'^(G)y(G)-l^l-a(G)/d(G). 

B = C''{G)+AG)(^ij^(j-d{G)y{G)-l^-AG)/d{G)_ 

Example 2.3. We now use (2.12) to obtain Poisson approximations for the
number of copies of the following fixed graphs with $v \ge 3$ vertices in the
$SBM(n, \pi, f)$ model. We consider the following strictly balanced graphs on
$v$ vertices each:

$G_{1,v}$: a tree on the $v$ vertices, with $v - 1$ edges;

$G_{2,v}$: the cycle graph on the $v$ vertices (with $v$ edges);

$G_{3,v}$: the complete graph on $v$ vertices with one edge removed;

$G_{4,v}$: $K_v$, the complete graph on $v$ vertices.

In order to apply (2.12), we must compute the quantities $d(G)$, $\alpha(G)$ and
$\gamma(G)$ for each graph $G$. These quantities are easy to compute, and the
values are given in Table 1. If for a given graph $G$ there exist universal
constants $c$ and $C$ such that
$c\,n^{-1/d(G)} \le \pi_{a,b} \le C n^{-1/d(G)}$ for all $a, b$, then a bound
for the total variation distance between the distribution of $W$ and the
$Po(\lambda)$ distribution now follows directly from (2.12). In Table 2, for
each graph $G$, we give the resulting bounds on the rate of convergence in terms
of $n$. For this rate of convergence it is assumed that the proportion vector
$f = f(n)$ remains constant as $n \to \infty$, and that $G$ does not change with
$n$. We also give a scaling of the edge probabilities that is required to give
a non-degenerate $\lambda$ in the limit. This scaling is given in terms of
$\pi^* = \max_{1 \le a \le b \le Q} \pi_{a,b}$ (note that all the $\pi_{a,b}$
are of the same order). Table 2 shows that the bound on the rate of convergence
for the tree graph may be considerably larger than the bound on the rate of
convergence in the cycle graph.



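Table 1: Values of $d(G)$, $\alpha(G)$ and $\gamma(G)$

Graph $G$ | $d(G)$ | $\alpha(G)$ | $\gamma(G)$
$G_{1,v}$ | $\frac{v-1}{v}$ | $\frac{(v-1)-1}{v-2} = 1$ | $\frac{(v-1)^2}{v} - (v-2) = \frac{1}{v}$
$G_{2,v}$ | $1$ | $\frac{v-1}{v-2}$ | $1$
$G_{3,v}$ | $\frac{(v+1)(v-2)}{2v}$ | $\frac{\binom{v}{2}-1-1}{v-2} = \frac{v^2-v-4}{2(v-2)}$ | $\frac{1}{3}$ if $v = 3$ and $d(G)v - \big(\binom{v}{2} - 2\big) = 1$ if $v \ge 4$
$G_{4,v}$ | $\frac{v-1}{2}$ | $\frac{\binom{v}{2}-1}{v-2} = \frac{v+1}{2}$ | $\binom{v}{2} - \big(\binom{v}{2} - 1\big) = 1$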


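Table 2: Scaling and bounds on the rate of convergence

Graph | Scaling | $d_{TV}(\mathcal{L}(W), Po(\lambda))$
$G_{1,v}$ | $\pi^* = C n^{-v/(v-1)}$ | $O(n^{-1/(v-1)}) = O((\pi^*)^{1/v})$
$G_{2,v}$ | $\pi^* = C n^{-1}$ | $O(n^{-1}) = O(\pi^*)$
$G_{3,v}$ | $\pi^* = C n^{-2v/((v+1)(v-2))}$ | $O(n^{-1/2}) = O((\pi^*)^{1/3})$ if $v = 3$ and $O(n^{-2v/((v+1)(v-2))}) = O(\pi^*)$ if $v \ge 4$
$G_{4,v}$ | $\pi^* = C n^{-2/(v-1)}$ | $O(n^{-2/(v-1)}) = O(\pi^*)$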
3 Subgraph counts in graph models with random edge probabilities

In this section, we consider a model in which the edge probabilities are
themselves random variables. Let $I = \{(u,v) : 1 \le u < v \le n\}$ be the
index set of potential edges and for $(u,v) \in I$ let
$\Theta_{u,v} \in [0,1]$ be random variables; given
$\Theta_{u,v} = \theta_{u,v}$ the edge indicator $Y_{u,v}$ is Bernoulli
distributed with parameter $\theta_{u,v}$. Conditional on the edge probabilities
$\{\Theta_{u,v} : (u,v) \in I\}$ the edge indicator variables
$\{Y_{u,v} : (u,v) \in I\}$ are assumed to be independent.

We shall assume a local dependence structure for the edge probabilities:
for any $(u,v) \in I$ there is a set $B_{u,v}$ such that for any edge set
$\mathcal{E}$, the collection of random variables
$\{\Theta_{u,v} : (u,v) \in \mathcal{E}\}$ is independent of the collection of
random variables
$\{\Theta_{x,y} : (x,y) \notin \cup_{(u,v) \in \mathcal{E}} B_{u,v}\}$.
Moreover, we assume that $B_{u,v}$ is of the form

$$B_{u,v} = \{(x,w) \in I : x \in M(u,v), w \in N(u,v)\}.$$

We shall often think of $N(u,v)$ as being a small set compared to
$\{1, \ldots, n\}$, whereas $M(u,v)$ could be a large set. We denote the least
upper bound on $\{|N(u,v)|, (u,v) \in I\}$ by $g$ so that

$$|N(u,v)| \le g$$

for all $(u,v)$. For independent edges, if $u < v$ we take $M(u,v) = \{u\}$ and
$N(u,v) = \{v\}$ so that $B_{u,v} = \{(u,v)\}$ and $g = 1$; for graphon models,
we can take $M(u,v) = \{1, \ldots, n\}$ and $N(u,v) = \{u,v\}$ so that
$B_{u,v} = \{(x,w) \in I : w \in \{u,v\}\}$, and $g = 2$. Other examples could
include exogenous covariates such as geographic location; edge random variables
could be independent if they are further than a certain geographic distance away
from each other.

The dependency structure is now more involved. For
$\alpha = (\alpha_1, \ldots, \alpha_{v(G)})$ let
$\mathcal{E}(\alpha) = \{(i,j) : i \ne j,\ i,j \in \{\alpha_1, \ldots, \alpha_{v(G)}\}\}$
denote the set of edges of the complete graph on $\alpha$. Then the set

$$A_\alpha = \{\beta \in \Gamma : |\mathcal{E}(\beta) \cap (\cup_{(u,v) \in \mathcal{E}(\alpha)} B_{u,v})| \ge 1\} \tag{3.1}$$

is a dependency neighbourhood of $\alpha$. In particular, if
$\beta \notin A_\alpha$ then $\{\Theta_{x,y} : (x,y) \in \mathcal{E}(\beta)\}$
is independent of $\{\Theta_{u,v} : (u,v) \in \mathcal{E}(\alpha)\}$. We can
bound the size of this dependency neighbourhood as follows. For
$\beta \in A_\alpha$, at least one of the vertices of $\beta$ is in a set
$N(u,v)$ for some $(u,v) \in \mathcal{E}(\alpha)$. Each of these sets $N(u,v)$
has at most $g$ elements. Hence

$$|A_\alpha| \le g\,v(G)\binom{n}{v(G)-1}. \tag{3.2}$$

For a set of edges $\gamma = \{\gamma_1, \gamma_2, \ldots, \gamma_k\}$ we
introduce the notation $V(\gamma)$ for the set of vertices which are endpoints
in $\gamma$, so that $|V(\gamma)| \le 2|\gamma|$. We let


( k V 

TT = 1 1 TT = 1 

"*"7 -1 


(3.3) 


With $\mu(G) = \mathbb{E}X_\alpha(G')$ and

$$\lambda := \mathbb{E}W = \binom{n}{v(G)}\rho(G)\mu(G)$$

we obtain the following variant of Theorem 2.1.


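Theorem 3.1. Assume that the $\pi_{a,b}$ are arbitrary random variables
supported on a subset of [0,1]. Let $\nu_{k,v,s}$ be as in (3.3). Suppose that
$G$ is a strictly balanced graph. Then

$$d_{TV}(\mathcal{L}(W), Po(\lambda)) \le (1 - e^{-\lambda})\rho(G)\,g \left\{ \frac{v(G)^2 n^{v(G)-1}}{v(G)!}\big(\mu(G) + \nu_{e(G),e(G),1}\big) + \nu_{1,e(G),v(G)} + \sum_{s=2}^{v(G)-1} \binom{v(G)}{s} \frac{n^{v(G)-s}\,\nu_{\kappa(G,s),e(G),s}}{(v(G)-s)!} \right\}, \tag{3.4}$$

where $\kappa(G,s)$ is as in Theorem 2.1.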

Proof. The proof proceeds almost exactly as that of Theorem 2.1. The
combinatorial arguments are exactly as before, although note the additional
factor of $g$ in (3.2). We also deal with the expectations in the formulas
for $\mathbb{E}\eta_\alpha(G')$ similarly. A complication arises from bounding
the expressions $\mathbb{E}X_\alpha(G')X_\beta(G'')$ which occur in
$\mathbb{E}\theta_\alpha(G')$; the analog of (2.9) is that for any set of edges
$A$ such that $|V(A) \cap V(G')| = s$,

$$\mathbb{P}(Y_{i,j} = 1, (i,j) \in A \mid X_\alpha(G') = 1) \le \nu_{|A|,e(G),s}.$$

□















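Remark 3.2. In the case that the edges are independent and
$\nu = \max_{u,v} \mathbb{E}(\Theta_{u,v})$, we find that
$\nu_{k,v,s} = \nu^k$ does not depend on $v$ or $s$. It is now an immediate
consequence of Theorem 3.1 that

$$d_{TV}(\mathcal{L}(W), Po(\lambda)) \le (1 - e^{-\lambda})\rho(G) \left\{ \frac{v(G)^2 n^{v(G)-1}}{v(G)!}\big(\mu(G) + \nu^{e(G)}\big) + \nu + \sum_{s=2}^{v(G)-1} \binom{v(G)}{s} \frac{n^{v(G)-s}\,\nu^{\kappa(G,s)}}{(v(G)-s)!} \right\}. \tag{3.5}$$

Taking the $\pi_{a,b}$ to be constants in (3.5) yields $\pi^* = \nu$ and
recovers the bound (2.4).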

4 Subgraph counts in a graphon model

The $h$-graphon model uses

$$\pi_{u,v} = h(U_u, U_v),$$

where $h : [0,1]^2 \to [0,1]$ is a symmetric, measurable function and $U_a$,
$a = 1, \ldots, n$, are independent $U[0,1]$ variables which index the graphon;
see for example PEHSIEI], and [T^ |26] for graphon estimation. In this case
edges are not independent, but edges which do not share a vertex are
independent, and we can choose $M(u,v) = \{1, \ldots, n\}$ and
$N(u,v) = \{u,v\}$ so that $g = 2$. Hence

$$\mu(G) = \int_{[0,1]^{v(G)}} du_1 \cdots du_{v(G)} \prod_{1 \le i < j \le v(G) :\, (i,j) \in E(G)} h(u_i, u_j). \tag{4.1}$$

With

$$\lambda := \mathbb{E}W = \binom{n}{v(G)}\rho(G)\mu(G)$$

the weak dependence structure yields the following corollary of Theorem 3.1.

Corollary 4.1. Let $\pi_{u,v} = h(U_u, U_v)$ where $h : [0,1]^2 \to [0,1]$ is a
symmetric, measurable function and $U_a$, $a = 1, \ldots, n$, are independent
$U[0,1]$ variables and let

$$h^* = \max h(u, v).$$

Suppose that $G$ is a strictly balanced graph. Then

$$d_{TV}(\mathcal{L}(W), Po(\lambda)) \le 2(1 - e^{-\lambda})\rho(G) \left\{ \frac{v(G)^2 n^{v(G)-1}}{v(G)!}\big(\mu(G) + (h^*)^{e(G)}\big) + h^* + \sum_{s=2}^{v(G)-1} \binom{v(G)}{s} \frac{n^{v(G)-s} (h^*)^{\kappa(G,s)}}{(v(G)-s)!} \right\}, \tag{4.2}$$

where $\kappa(G,s)$ is given in (2.5).


Proof. Due to the conditional independence of the edges, for any edge
indicator $Y_{i,j}$ which is not included in $X_\alpha(G')$,

$$\begin{aligned}
\mathbb{P}(Y_{i,j} = 1 \mid X_\alpha(G') = 1) &= \int_{[0,1]^{v(G)}} du_1 \cdots du_{v(G)}\, \mathbb{P}(Y_{i,j} = 1 \mid U_v = u_v, v \in V(G')) \\
&= \int_{[0,1]^{v(G)}} du_1 \cdots du_{v(G)}\, h(u_i, u_j) \\
&\le h^*.
\end{aligned} \tag{4.3}$$

Hence

$$\mathbb{P}(Y_{i,j} = 1, (i,j) \in A \mid X_\alpha(G') = 1) \le (h^*)^{|A|} \tag{4.4}$$

and so $\nu_{k,v,s} \le (h^*)^k$ for all $v$ and $s$. Also, $g = 2$ for graphon
models. The bound (4.2) now follows from applying bound (3.4) of Theorem 3.1. □

Remark 4.2. In the proof of Corollary 4.1 we could have replaced (4.3) by

$$\mathbb{P}(Y_{i,j} = 1 \mid X_\alpha(G') = 1) = \int_{[0,1]^{v(G)}} du_1 \cdots du_{v(G)}\, h(u_i, u_j) \le \mathbb{E}\Big[\max_{U_i, U_j :\, i \ne j \in V(G)} h(U_i, U_j)\Big]. \tag{4.5}$$

For example, if $h(x,y) = \frac{1}{2}(x + y)$ then $h^* = 1$ whereas, using the
order statistic notation,

$$\mathbb{E}\Big[\max_{U_i, U_j :\, i \ne j \in V(G)} h(U_i, U_j)\Big] = \frac{1}{2}\mathbb{E}\big(U_{(v(G))} + U_{(v(G)-1)}\big) = \frac{2v(G) - 1}{2(v(G) + 1)} < 1.$$

Similarly, (4.4) could be replaced by

$$\mathbb{P}(Y_{i,j} = 1, (i,j) \in A \mid X_\alpha(G') = 1) \le \mathbb{E}\Big[\max_{U_i,\, i \in V(G)} \prod_{(i,j) \in A} h(U_i, U_j)\Big]. \tag{4.6}$$

While (4.5) and (4.6) would yield numerically smaller bounds, $h^*$ is easier
to calculate in applications.
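The order-statistic calculation in this remark can be checked numerically. For $v$ independent $U[0,1]$ variables, $\mathbb{E}U_{(v)} = v/(v+1)$ and $\mathbb{E}U_{(v-1)} = (v-1)/(v+1)$, so with $h(x,y) = \frac{1}{2}(x+y)$ the expected maximum is $(2v-1)/(2(v+1))$. The sketch below compares this value with a Monte Carlo estimate; the function names, the number of repetitions and the seed are arbitrary choices made for this illustration.

```python
import random
from fractions import Fraction
from itertools import combinations

def exact_expected_max(v):
    """E[max over pairs of (U_i + U_j)/2] for v >= 2 independent uniforms."""
    return Fraction(2 * v - 1, 2 * (v + 1))

def monte_carlo_expected_max(v, reps=20000, seed=0):
    """Simulation estimate of the same expectation."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        u = [rng.random() for _ in range(v)]
        total += max((x + y) / 2 for x, y in combinations(u, 2))
    return total / reps

v = 3
print(float(exact_expected_max(v)), monte_carlo_expected_max(v))
```

For $v = 3$ the exact value is $5/8$, noticeably below $h^* = 1$, which illustrates how much the cruder constant can give away for small graphs.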

Example 4.3. In analogy to copulas, where Archimedean copulas have proved
a useful concept, consider what can be coined an Archimedean graphon: Let
h : [0,1]^ —)■ [0,1] he given by h{x,y) = where : 

[0, oo) —[0,1] is a continuous, strictly decreasing function which is convex 
on the open interval (0, oo) and^p^~^^{x) = inf{M : ^jJ{u) < x} is its generalised 
inverse. Using the Williamson transform we can write 


where is the c.d.f. of a non-negative random variable R which has no 
atom at zero, see for example fT^ . If inf {x : dF^i^x) > 0} = a/j with a/j > 0 
then 

/ OO 

(l — —'j dFji{t) = 1 — 

„ ^ t I 


' 0,R 


In contrast, minf/,_jg^(G) n(ij)GA+ ^K^j)) as used in (ITOP 

would be more difficult to calculate. 

The next example illustrates how scaling considerations enter in the dis¬ 
tributional bound. 

Example 4.4. Let $h : [0,1]^2 \to [0,1]$ be given by $h(x,y) = xy$. In this
case, (4.1) gives that

$$\mu(G) = \int_{[0,1]^{v(G)}} du_1 \cdots du_{v(G)} \prod_{i \in V(G)} u_i^{\deg_G(i)} = \prod_{i \in V(G)} \frac{1}{\deg_G(i) + 1},$$

where $\deg_G(i)$ is the degree of $i$ in $G$, that is, the number of edges in
$E(G)$ which have $i$ as an end point; $1 \le \deg_G(i) \le v(G) - 1$. Thus in
order to obtain a moderate value of $\lambda$, the graph $G$ has to have a large
number of vertices with degrees which typically grow like $n$; such graphs are
also called
dense graphs. In this example, $h^* = 1$ and the bound in Corollary 4.1 will be
of the order $n^{v(G)-1}$ when the graph $G$ is fixed.

If instead we consider the function fn '■ [0,1]^ [0,1]; hfix, y) = n~'^xy 

then the limiting Poisson distribution is non-degenerate and, as in (2.12), the
bound in Corollary 4.1 tends to 0 as $n$ tends to $\infty$.
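For the product graphon $h(x,y) = xy$ the integral for $\mu(G)$ factorises over the vertices, since each $u_i$ appears once for every edge at $i$ and $\int_0^1 u^d\,du = 1/(d+1)$. A short sketch of this degree-based formula follows; the function name is hypothetical.

```python
from collections import Counter
from fractions import Fraction

def mu_product_graphon(g_edges):
    """mu(G) for h(x, y) = x * y: product over vertices of 1 / (deg(i) + 1)."""
    degree = Counter(x for e in g_edges for x in e)
    result = Fraction(1)
    for d in degree.values():
        result *= Fraction(1, d + 1)   # integral of u^d over [0, 1]
    return result

triangle = [(0, 1), (1, 2), (0, 2)]
path = [(0, 1), (1, 2)]
print(mu_product_graphon(triangle), mu_product_graphon(path))
```

Since this value does not depend on $n$, $\lambda = \binom{n}{v(G)}\rho(G)\mu(G)$ grows like $n^{v(G)}$ for a fixed graph $G$, which is why a rescaled graphon is needed for a non-degenerate Poisson limit.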
Finally, we note that the $h$-graphon model can be viewed as a stochastic
block model if $h$ is piecewise constant. If
$0 = s_1 < s_2 < \cdots < s_{Q+1} = 1$, where $s_i = \sum_{k=1}^{i-1} f_k$, is a
partition of [0,1] so that $h$ is constant on each rectangle
$[s_i, s_{i+1}) \times [s_j, s_{j+1})$, then we could assign type $i$ to vertex
$v$ if $U_v \in [s_i, s_{i+1})$. The randomness now lies only in the class
assignments. In this case we recover Theorem 2.1.